A 13-gene signature to predict the prognosis and immunotherapy responses of lung squamous cell carcinoma

Lung squamous cell carcinoma (LUSC) comprises 20–30% of all lung cancers. Immunotherapy has significantly improved the prognosis of LUSC patients; however, only a small subset of patients responds to the treatment. Therefore, we aimed to develop a novel multi-gene signature associated with the immune phenotype of the tumor microenvironment for LUSC prognosis prediction. We stratified the LUSC patients from The Cancer Genome Atlas dataset into hot and cold tumor according to a combination of infiltration status of immune cells and PD-L1 expression level. Kaplan–Meier analysis showed that hot tumors were associated with shorter overall survival (OS). Enrichment analyses of differentially expressed genes (DEGs) between the hot and cold tumors suggested that hot tumors potentially have a higher immune response ratio to immunotherapy than cold tumors. Subsequently, hub genes based on the DEGs were identified and protein–protein interactions were constructed. Finally, we established an immune-related 13-gene signature based on the hub genes using the least absolute shrinkage and selection operator feature selection and multivariate cox regression analysis. This gene signature divided LUSC patients into high-risk and low-risk groups and the former inclined worse OS than the latter. Multivariate cox proportional hazard regression analysis showed that the risk model constructed by the 13 prognostic genes was an independent risk factor for prognosis. Receiver operating characteristic curve analysis showed a moderate predictive accuracy for 1-, 3- and 5-year OS. The 13-gene signature also performed well in four external cohorts (three LUSC and one melanoma cohorts) from Gene Expression Omnibus. Overall, in this study, we established a reliable immune-related 13-gene signature that can stratify and predict the prognosis of LUSC patients, which might serve clinical use of immunotherapy.

www.nature.com/scientificreports/ tumor purity 16 . Stromal and immune scores were used to predict the level of infiltrating stromal and immune cells. ESTIMATE score based on the immune score and stromal scores were further used to infer tumor purity in tumor tissue 16 . CIBERSORT algorithm was performed to calculate the abundance of 22 types of immune cell subsets in each sample 17 . Identification of differentially expressed genes (DEGs) and enrichment analysis. The R package "DESeq2" 18 was applied to identify DEGs (p.adjust value < 0.05 and |log2FC|> 1) between the hot and cold tumors. Afterward, the DEGs were used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes www.nature.com/scientificreports/ (KEGG) analyses by using the R package "clusterProfiler" 19 . All genes were arranged into a ranked list according to the fold change and then used to conduct a GSEA analysis. For the target set of GSEA analysis requirements, we obtained hallmark gene sets (h.all.v7.4.entrez.gmt) from Molecular Signatures Database (MSigDB). Pathways with p.adjust value < 0.05 were selected.

Protein protein interaction (PPI) network and hub genes identification.
To further investigate the interactions between the DEGs, a PPI network was constructed using the STRING (https:// string-db. org/) database and interaction scores > 0.4 were considered statistically significant. The MCODE was a tool to detect densely connected regions in large protein-protein interaction networks that might represent molecular complexes 20 , which was used to extract key sub-networks and identify hub genes in the network. Enrichment analyses were performed on the hub genes. Subsequently, the CytoHubba 21 was used to obtain the top 10 hub genes in hot and cold tumors, respectively. All parameters were default values and all the above networks were visualized in Cytoscape software (v3.8.2).
Screening of prognostic multi-gene signature. The univariate Cox regression analysis was used to obtain prognostic genes in the hub genes. The prognostic 13-gene signature was established by the multivariate cox analysis and the least absolute shrinkage and selection operator (LASSO) analysis 22 . We evaluated the risk score of each patient by the below formula: The Kaplan-Meier survival curve was used validate the prognostic value of the 13-gene signature. Receiver operating characteristic (ROC) curve analysis was used to estimate the sensitivity and specificity of the 13-gene signature. Moreover, to demonstrate that the risk score was an independent prognostic factor, we conducted univariate and multivariate Cox regression analyses to examine the prognostic value of the risk score and other clinical indicators in the TCGA-LUSC patients.
Statistical analysis. The Wilcoxon test was used to compare the differences in the proportion of immune cells, ICMs expression levels and ESTIMATE scores between the hot and cold tumors. For the OS analysis, the Kaplan-Meier curve and the two-sided log-rank test were performed. R package v4.0.2 was performed for all analyses and p < 0.05 was considered as statistical significance.

Stratification of hot and cold tumors.
To construct a multi-gene signature for predicting OS of LUSC patients, we designed and processed our study as shown in the flow chart ( Fig. 1). Hot tumors are supposed to have a relatively higher immune infiltration and are thus more likely to respond to immunotherapy compared with cold tumors 7,8 . PD-L1 is a co-inhibitory ICM that contributes to the immune escape of cancer cells 5 . LUSC patients with an upregulated PD-L1 expression are more likely to benefit from immunotherapy 23 . Five hundred and four LUSC-TCGA tumor samples (Table 1) were divided into two groups: hot and cold tumors, when a combination of immune infiltration scores and the PD-L1 expression level was used as a cutoff. Tumors responding to immunotherapy had the top 50% immune infiltrates scores and the top 50% PD-L1 expression were referred to as 'hot tumors' , whereas the rest tumors were 'cold tumors' . Kaplan-Meier analysis demonstrated that patients with hot tumors had a significantly shorter OS than patients with cold tumors (Fig. 2a). We subsequently compared immune cell infiltration levels between the hot and cold tumors using ESTIMATE (Fig. 2b), the expression level of co-stimulatory ICMs and CIBERSORT (Fig. 2c,d). Most of the co-stimulatory ICMs were upregulated in hot tumors (Fig. 2c). Immune cells were more infiltrated in hot tumors than cold tumors ( Fig. 2b-d). These results together indicated that hot tumors were more likely to respond to immunotherapy than cold tumors. Enrichment analyses of the DEGs. 1203 DEGs (including 564 upregulated DEGs and 639 downregulated DEGs) were identified in the hot compared with cold tumors (Fig. 3a). To gain a functional understanding of the DEGs, we conducted GO (Fig. 3b) and KEGG (Fig. 3c) analyses on the 1203 DEGs. We also performed a GSEA analysis (Fig. 3d) based on the rank information of all genes. The GO biological process (BP) mainly included the proliferation and regulation of multi-immune cells (Fig. 3b). The most abundant GO molecular function (MF) was immune receptor activity (Fig. 3b). GO cellular component (CC) was enriched in 'neuroactive ligand-receptor interaction' and 'metabolism of xenobiotics by cytochrome P450' (Fig. 3b). Pathway enrichment analysis regarding KEGG focused on 'cytokine-cytokine receptor interaction' , 'cytokine signaling pathway' , 'antigen processing and presentation' and 'natural killer cell-mediated cytotoxicity' (Fig. 3c). The GESA was enriched in the hallmark gene sets of 'interferon α/γ response' , 'inflammatory response' , 'IL5-STAT5 signaling' , 'IL6-STAT3 signaling' and 'TNF-α signaling via NF-KB' (Fig. 3d).

Hub genes and protein-protein interactions (PPIs).
To further explore the functions of DEGs, we conducted a KEGG analysis in hot and cold tumors, respectively. When compared with the cold tumor, the upregulated DEGs in the hot tumors were mainly enriched in immune-related pathways (Fig. 4a) and the downregulated DEGs in the hot tumors were mainly enriched in metabolic pathways (Fig. 4b). To further explore the interaction between the DEGs, we constructed PPI networks by STRING in hot and cold tumors, respectively. Afterward, 337 hub genes based on the 1203 DEGs were identified through MCODE in Cytoscape software. The hub genes have relatively higher intro module connectivity and gene significance than the other genes and play key roles in pathways in the co-expression network. The top 10 hub genes in hot and cold tumors were filtered www.nature.com/scientificreports/ into the PPI network ( Fig. 4c,d). The common feature of the top hub genes (for instance, IL17A, CD28, CD80 and CD40LG) in hot tumors was that they were involved in the immune activation process directly or indirectly (Fig. 4c). The top hub genes in cold tumors were keratin (KRT) family members, which were not so closely related to immune responses (Fig. 4d).

Identification of an immune-related 13-genes signature.
To estimate the value of the 337 hub genes in predicting OS in LUSC, the TCGA-LUSC dataset was used as a training cohort. The univariate Cox regression analysis was used to obtain prognostic genes in the above hub genes. Subsequently, the LASSO and multivariate cox regression analyses were performed to identify prognostic genes with the strongest predicting ability in the training cohort ( Fig. 5a Supplementary Table S1. By the median risk score, LUSC patients were divided into high-and low-risk groups and Kaplan-Meier curve showed that poor OS outcomes of LUSC patients were associated with the high-risk scores (Fig. 6a). According to the risk scores, the LUSC patients were ranked from left to right shown in the upper panel of Fig. 6b. The risk scores increased from left to right. OS distribution of each patient was shown in the lower panel of Fig. 6b where LUSC patients were ranked from left to right according to risk scores. A ROC curve was constructed to analyze the diagnostic accuracy of the 13-gene signature. It revealed that the 13-gene signature could serve as valuable biomarker for distinguishing between LUSC and control subjects (for 1-year, areas under the curve (AUCs) = 0.70; for 3-year, AUC = 0.76; for 5-year, AUC = 0.76) (Fig. 6c). To determine if the 13-gene signature was an independent prognostic marker, univariate and multivariate cox regression analyses were performed on the TCGA-LUSC dataset. The risk score of the 13-gene signature and other clinico-pathological factors, including gender, age, neoplasm cancer status, stage and smoking status were used as covariates in the cox regression analysis. We found a significant association between the 13-gene signature and OS in the TCGA dataset (HR = 1.5893, p < 0.0001). Our results showed that this 13-gene signature was an independent risk factor for predicting the OS of LUSC patients (Fig. 6d). The detailed univariate and multivariate cox analyses of the 13-gene signature and other clinico-pathological factors were showed in Table 2. External four GEO-LUSC datasets (GSE30219, GSE12472, GSE157011 and GSE78220) were utilized to validate the prediction power of the 13-gene signature, www.nature.com/scientificreports/ of which GSE78220 is a melanoma immunotherapy dataset. In line with the results in the training cohort (TCGA dataset), the Kaplan-Meier curve indicated that the risk scores could distinguish the patients well in the GEO datasets (Fig. 7); LUSC patients with low-risk scores demonstrated a significantly longer OS in the three validation cohorts. Based on these results, the 13-gene signature performed well in predicting OS of LUSC patients and could potentially guide the clinical management.

Discussions
LUSC comprises about 20-30% of all lung cancers 27 . Its clinical outcome has been poor for the past decades, because of a limited treatment strategy. The situation has dramatically changed mainly with the clinical introduction of ICIs-based immunotherapy in recent years; however, it is quite clear that only a subgroup of LUSC patients achieves sustained benefits from ICIs-based immunotherapy. At present, various LUSC outcomes have been identified in patients with similar clinical and pathological features, suggesting that the current clinical prognostic factors used may be insufficient to consistently predict individual clinical outcomes. Predictive indicators are the most important to choose rational treatment. Identifying reliable prognostic markers with higher sensitivity and accuracy in LUSC is in urgent need. Molecular markers on tumor have been extensively www.nature.com/scientificreports/ investigated for prognosis and guidance of cancer therapy; much less for tumor-associated immune cells in TME. TME is an environment where tumors are considered as complex dynamic tissues with an important interplay of various cells including tumor-infiltrating immune cells, which is crucial for identifying effective biomarkers for predicting drug resistance and cancer progression 28 . Many studies have demonstrated that T cells are the major immune cells infiltrating tumors in TME and the degree of lymphocytic infiltration is positively associated with an absence of tumor metastases 29 . ICMs are indispensable for the full activation of T cells. PD-L1, as a major ICM, is expressed on the cell surface in tumor-associated immune cells and various cancer cells 5 . Although the expression of PD-L1 has been widely evaluated in ICI-based immunotherapies as a positive predictive marker, it is still an imperfect predictive biomarker 2,30 . In our study, for the first time, LUSC patients distributed in hot and cold tumors were characterized by a combination of immune cell infiltration and PD-L1 expression associated with TME. Tumors responding to immunotherapy had a higher level of PD-L1 expression (top 50%) and immune infiltrates scores (top 50%) were referred to as 'hot tumors' , whereas the rest of the tumors were 'cold tumors' . Hot and cold tumors defined by our method predicted well in OS of LUSC patients. ESTIMATE, the expression level of co-stimulatory ICM, CIBER-SORT and enrichment analyses all suggested that the hot tumors potentially had a higher immune response to immunotherapy than cold tumors, which further approved our stratification of tumors. Recently, this unofficial classification of tumors into two categories, 'hot tumors' and 'cold tumors' , has been increasingly advocated. This immune-based, rather than tumor-based patient classification according to tumor immune infiltration, has shown a greater relative prognostic value than the traditional AJCC/UICC-TNM stratification system 12,31,32 . Different classifications of tumors represent various responses to immunotherapeutic options. Hot tumors are www.nature.com/scientificreports/ more likely to benefit from immunotherapy. Our stratification of LUSC patients contribute to dichotomizing tumors and can ultimately contribute to converting cold tumors to hot tumor. Hot tumors showed remarked differences in hub genes profile from the cold tumors. We found that the top hub genes of hot tumors comprised many immune-related genes (for instance, IL17A, CD28, CD80 and CD40LG). Co-stimulatory ICMs CD28, CD80 and CD40LG are secondary signal molecules in the T lymphocyte activation, which activate patients' anti-tumor immune responses, leading to increased efficacy of cancer immunotherapy 5 . CD28 is associated with an abundance of lymphocytes and longer OS in lung adenocarcinoma (LUAD) 33 . CD80 activates effector T cells via interacting with the receptors CD28 on the surface of the T cells. Upregulated CD80 predicts good prognosis in gastric adenocarcinoma 34 and oral squamous cell carcinoma 35 . Expression of CD40LG in the tumor-free lymph node is positively related to a good prognosis in oral squamous cell carcinoma 35 . We also observed that all top hub genes in cold tumors were keratin family members (for instance, KRT20, KRT12 and KRT4). Keratins are expressed in highly specific patterns correlated to the epithelial type and stage of cellular differentiation. Characteristic expression patterns of keratins are also observed in cancers 36 . Moreover, keratins are diagnostic and prognostic markers in epithelial cancers. For example, downregulated hub gene KRT20 indicates poor patient outcomes in colorectal cancer 37 , pancreatic adenocarcinomas 38-40   www.nature.com/scientificreports/ of stressful conditions including death receptor activation and drugs 42,43 . Further studies are required to identify the mechanisms of keratins explaining this possible decreased susceptibility and identifying prognostic markers in immunotherapy of LUSC. A gene signature predicting the prognosis of a large cohort of cancer patients is of great significance, because the gene expression can capture the influence of the changes of multiple genes at the same time and summarize the prognosis of multiple 'conventional' risk factors into one risk score 44,45 . Although gene expressions currently are not involved in the standard diagnosis of LUSC, it is proved to be a comprehensive tool for predicting outcomes in many cancers. For instance, a 3-gene signature has been proved to be a comprehensive tool for leukemia diagnosis and classification due to its high accuracy in all clinically relevant leukemia sub-entity   46 . In this study, we applied multi-cox regression analysis and LASSO feature selection to screen a 13-gene signature among 337 hub genes. In TCGA and three GEO cohorts validation, the 13-gene signature significantly stratified patients into high-vs low-risk groups in terms of OS and remained as an independent prognostic factor in multivariate analysis. Among the 13 prognostic genes, CD1E, KLRC2 and GAGE2A are more relevant to tumor immunity. CD1E is an MHC class I-like molecule that presents antigens to T cells and thus regulates T cells participation in the immune response. CD1E can predict the efficacy of immunotherapy in patients with nonmuscle-invasive bladder cancer 47 and glioblastomas 48 . KLRC is expressed primarily in natural killer (NK) cells. Tumor infiltration of NK cells is correlated with the prolonged survival of cancer patients. Either acute exercise or in vitro expansion of KLRC+/NKG2A− NK cells can enhance the anti-tumor cytotoxicity of NK cells for immunotherapy 49 .
In ovarian cancer, tumor-specific antigen GAGE2A can be used as an indicator for early diagnosis, efficacy evaluation and prognostic determination 50 .
In conclusion, for the first time, by dividing tumors into hot and cold tumors according to a combination of their immune infiltration and PD-L1 expression, this study proposed an immune-based rather than a tumorbased classification specifically for LUSC. Moreover, an immune-related 13-gene prognostic signature was developed and validated for prognosis prediction in LUSC through multi-step bioinformatics. This signature was strongly associated with OS in LUSC patients and might serve as a potential prognostic biomarker for clinical use of immunotherapy in the future. Prospective studies are needed to test the clinical utility of the signature for effective treatment strategies and personalized therapies of LUSC.